Estimating the Quality of Translated User-Generated Content

Authors

  • Raphaël Rubino
  • Jennifer Foster
  • Rasoul Samad Zadeh Kaljahi
  • Johann Roturier
  • Fred Hollowood
Abstract

Previous research on quality estimation for machine translation has demonstrated the possibility of predicting the translation quality of well-formed data. We present a first study on estimating the translation quality of user-generated content. Our dataset contains English technical forum comments which were translated into French by three automatic systems. These translations were rated in terms of both comprehensibility and fidelity by human annotators. Our experiments show that tried-and-tested quality estimation features work well on this type of data but that extending this set can be beneficial. We also show that the performance of particular types of features depends on the type of system used to produce the translation.

Similar resources

Evaluation of Machine-Translated User Generated Content: A pilot study based on User Ratings

This paper presents the results of an experimental pilot user study, focusing on the evaluation of machine-translated user-generated content by users of an online community forum and how those users interact with the MT content that is presented to them. Preliminary results show that ratings are very difficult to obtain, that a low percentage of posts (21%) was rated, that users need to be well...

Semantic Tagging and Inference in Online Communities

In this paper we present UsTag, an approach for providing user defined semantics for user generated content (UGC) and process those semantics with user defined rules. User semantics is provided with a tagging mechanism extended in order to express relationships within the content. These relationships are translated to RDF triples. RDF triples along with user defined rules enable the creation of...

Exploring the Popularity, Reputation and Certification of User-Generated Software

User-Generated Content has reshaped the landscape of the Information Marketplace during the last years. Among the content, software is a very impacting class. In comparison to regular resource, the estimation of popularity, reputation and certification of user-generated software in distributed architecture appears to be a challenging task. Furthermore how to use popularity to help user to disco...

Maintaining Sentiment Polarity in Translation of User-Generated Content

The advent of social media has shaken the very foundations of how we share information, with Twitter, Facebook, and LinkedIn among many well-known social networking platforms that facilitate information generation and distribution. However, the maximum 140-character restriction in Twitter encourages users to (sometimes deliberately) write somewhat informally in most cases. As a result, machine ...

Assessment of uncertainty for coal quality-tonnage curves through minimum spatial cross-correlation simulation

Coal quality-tonnage curves are helpful tools in optimum mine planning and can be estimated using geostatistical simulation methods. In the presence of spatially cross-correlated variables, traditional co-simulation methods are impractical and time consuming. This paper investigates a factor simulation approach based on minimization of spatial cross-correlations with the objective of modeling s...

Publication date: 2013